-
Notifications
You must be signed in to change notification settings - Fork 49
[5.7] Character class and scalar coalescing fixes #588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[5.7] Character class and scalar coalescing fixes #588
Conversation
29df207
to
b154485
Compare
|
b154485
to
4d89b8d
Compare
This allows us to catch the case where a match occurs without optimizations, but doesn't occur with optimizations. Additionally fix the `xfail` param such that it can't be used on tests that actually match expectations.
Replace a couple of `#if os(Linux)` checks with a check to see if we have a newer stdlib available. This lets us emit an expected failure in the case where we're testing on an older stdlib.
Previously we performed a lexicographic comparison with the bounds of a character class range. However this produced surprising results, and our implementation didn't properly handle case sensitivity. Update the logic to instead only allow single scalar NFC bounds. The input is then converted to NFC in grapheme semantic mode, and checked against the range. In scalar semantic mode, the input scalar is checked on its own. Additionally, fix the case sensitivity handling such that we check both the lowercase and uppercase version of the input against the range.
Previously we would emit a series of scalars written in the DSL as a series of individual characters in grapheme semantic mode. Change the behavior such that we coalesce any adjacent scalars and characters, including those in regex literals and nested concatenations. We then perform grapheme breaking over the result, and can emit character matches for scalars that coalesced into a grapheme. This transform subsumes a similar transform we performed for regex literals when converting them to a DSLTree. This has the nice side effect of allowing us to better preserve scalar syntax in the DSL transform. rdar://96942688
Previously we would only match entire characters. Update to use the generic Character consumer logic that can handle scalar semantic mode. rdar://97209131
In grapheme semantic mode, coalesce adjacent character and scalar members of a custom character class, over which we can perform grapheme breaking. This involves potentially re-writing ranges such that they contain a complete grapheme of adjacent scalars.
Make sure we throw the right error for ranges that are invalid in grapheme mode, but are valid in scalar mode.
I also noticed that `lexQuantifier` could silently eat trivia if it failed to lex a quantification, so also fix that.
4d89b8d
to
7dbe453
Compare
@swift-ci please test |
5.7 cherry-pick of #570 + #574 + #559
Resolves rdar://97386173